A quick initial exploration of the main characteristics of the German skills matrices. The dataset used here is ‘occ_SUF.DTA’, a 3-digit occupation matrix.
Reading the data into a network object:
library(foreign)
library(igraph)
Attaching package: ‘igraph’
The following objects are masked from ‘package:stats’:
decompose, spectrum
The following object is masked from ‘package:base’:
union
# read data
data_occ_SUF <- data.frame(read.dta("../data/MR_04-17_EN_data_GermanySkillsMatrix/occ_SUF.DTA"))
# select edges where the flow is higher than random
data_occ_SUF_df <- data_occ_SUF[data_occ_SUF$SRt>0,]
Top 20 flows:
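library(knitr)
# sort the edges by SRt in descending order
top_occ_SUF_df <- data_occ_SUF_df[order(-data_occ_SUF_df$SRt),]
# show the 20 strongest flows, excluding rows where SRt equals 1
kable(top_occ_SUF_df[top_occ_SUF_df$SRt!=1,][1:20,])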
| | occ_SUF_1 | occ_SUF_2 | SRt |
|---|---|---|---|
| 13070 | Home wardens, social work teachers | Nursery teachers, child nurses | 0.9806 |
| 13189 | Nursery teachers, child nurses | Home wardens, social work teachers | 0.9775 |
| 1089 | Special printers, screeners until printer s assistants | Type setters, compositors until printers (flat, gravure) | 0.9732 |
| 12226 | Physicians until Pharmacists | Dietary assistants, pharmaceutical assistants until medical laboratory assistants | 0.9713 |
| 12702 | Dietary assistants, pharmaceutical assistants until medical laboratory assistants | Physicians until Pharmacists | 0.9682 |
| 12584 | Nursing assistants | Nurses, midwives | 0.9624 |
| 970 | Type setters, compositors until printers (flat, gravure) | Special printers, screeners until printer s assistants | 0.9609 |
| 12707 | Dietary assistants, pharmaceutical assistants until medical laboratory assistants | Medical receptionists | 0.9511 |
| 12674 | Dietary assistants, pharmaceutical assistants until medical laboratory assistants | Publishing house dealers, booksellers until service-station attendants | 0.9507 |
| 12944 | Social workers, care workers until religious care helpers | Nurses, midwives | 0.9503 |
| 11981 | Musicians until scenery/sign painters | Artistic and assisting occupations (stage, video and audio) until performers, professional sportsmen, auxiliary artistic occupations | 0.9457 |
| 8230 | Biological specialists until physical and mathematical specialists | Chemical laboratory assistants until photo laboratory assistants | 0.9449 |
| 12465 | Nurses, midwives | Nursing assistants | 0.9448 |
| 12100 | Artistic and assisting occupations (stage, video and audio) until performers, professional sportsmen, auxiliary artistic occupations | Musicians until scenery/sign painters | 0.9443 |
| 12949 | Social workers, care workers until religious care helpers | Home wardens, social work teachers | 0.9426 |
| 6292 | Goods painters, lacquerers until ceramics/glass painters | Painters, lacquerers (construction) | 0.9422 |
| 12826 | Medical receptionists | Dietary assistants, pharmaceutical assistants until medical laboratory assistants | 0.9414 |
| 12468 | Nurses, midwives | Social workers, care workers until religious care helpers | 0.9406 |
| 13068 | Home wardens, social work teachers | Social workers, care workers until religious care helpers | 0.9394 |
| 12588 | Nursing assistants | Social workers, care workers until religious care helpers | 0.9371 |
# build the graph from the filtered edge list
occ_SUF_net <- graph_from_data_frame(data_occ_SUF_df)
# network object
occ_SUF_net
IGRAPH ddfe691 DN-- 120 3773 --
+ attr: name (v/c), SRt (e/n)
+ edges from ddfe691 (vertex names):
[1] Farmers until animal keepers and related occupations->Farmers until animal keepers and related occupations
[2] Farmers until animal keepers and related occupations->Gardeners, garden workers until forest workers, forest cultivators
[3] Farmers until animal keepers and related occupations->Miners until shaped brick/concrete block makers
[4] Farmers until animal keepers and related occupations->Wood preparers until basket and wicker products makers
[5] Farmers until animal keepers and related occupations->Agricultural machinery repairers until precision mechanics
[6] Farmers until animal keepers and related occupations->Butchers until fish processing operatives
[7] Farmers until animal keepers and related occupations->Cooks until ready-to-serve meals, fruit, vegetable preservers, preparers
[8] Farmers until animal keepers and related occupations->Wine coopers until sugar, sweets, ice-cream makers
+ ... omitted several edges
V(occ_SUF_net) # The vertices of the "occ_SUF_net" object
+ 120/120 vertices, named, from ddfe691:
[1] Farmers until animal keepers and related occupations
[2] Gardeners, garden workers until forest workers, forest cultivators
[3] Miners until shaped brick/concrete block makers
[4] Ceramics workers until glass processors, glass finishers
[5] Chemical plant operatives
[6] Chemical laboratory workers until vulcanisers
[7] Plastics processors
[8] Paper, cellulose makers until other paper products makers
[9] Type setters, compositors until printers (flat, gravure)
[10] Special printers, screeners until printer s assistants
+ ... omitted several vertices
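Density: the proportion of edges present out of all possible edges in the network. The graph contains self-loops (see edge [1] above); with loops=FALSE they are still counted in the numerator, so the value slightly overstates the density between distinct occupations.
edge_density(occ_SUF_net, loops=FALSE)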
[1] 0.2642157
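As a minimal sketch of what edge_density() computes, the example below reproduces the calculation by hand on a small made-up edge list. The names toy_edges and toy_net and the SRt values are illustrative assumptions, not taken from occ_SUF.DTA. It also shows how a self-loop affects the result when loops = FALSE.

```r
library(igraph)

# Hypothetical toy edge list; names and SRt values are made up for illustration
toy_edges <- data.frame(
  from = c("A", "A", "B", "C", "A"),
  to   = c("B", "C", "A", "A", "A"),  # the last row is a self-loop
  SRt  = c(0.9, 0.4, 0.8, 0.3, 1.0)
)
toy_net <- graph_from_data_frame(toy_edges, directed = TRUE)

n <- vcount(toy_net)

# With loops = FALSE the denominator is n * (n - 1) for a directed graph
manual_density  <- ecount(toy_net) / (n * (n - 1))
builtin_density <- edge_density(toy_net, loops = FALSE)

# Dropping self-loops first gives the density between distinct nodes only
toy_noloops     <- simplify(toy_net, remove.multiple = FALSE, remove.loops = TRUE)
density_noloops <- edge_density(toy_noloops, loops = FALSE)

print(c(manual = manual_density, builtin = builtin_density, no_loops = density_noloops))
```

Applying simplify() with remove.loops = TRUE to occ_SUF_net before calling edge_density() would be one way to obtain the loop-free figure, if that is the quantity of interest.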
Reciprocity: the proportion of reciprocated ties in the directed network.
reciprocity(occ_SUF_net)
[1] 0.9050096
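To illustrate the definition, here is a small sketch on a made-up directed graph (toy_edges and toy_net are hypothetical names, not part of the dataset). It flags each edge whose reverse edge also exists and compares the resulting share with reciprocity().

```r
library(igraph)

# Hypothetical toy edge list: A <-> B is reciprocated, A -> C is not
toy_edges <- data.frame(
  from = c("A", "B", "A"),
  to   = c("B", "A", "C")
)
toy_net <- graph_from_data_frame(toy_edges, directed = TRUE)

# TRUE for every edge whose reverse edge is also present
mutual_flags <- which_mutual(toy_net)

# Share of reciprocated edges, by hand and with igraph's default mode
manual_reciprocity  <- mean(mutual_flags)
builtin_reciprocity <- reciprocity(toy_net)

print(c(manual = manual_reciprocity, builtin = builtin_reciprocity))
```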
Transitivity (clustering): measures the probability that the adjacent nodes of a node are themselves connected. In other words, if i is connected to j, and j is connected to k, what is the probability that i is also connected to k?
transitivity(occ_SUF_net, type="global") # net is treated as an undirected network
[1] 0.5701134
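The global transitivity can be written as three times the number of triangles divided by the number of connected triples. The sketch below checks this on a made-up graph with one triangle and one pendant node. All names are hypothetical, and the triple count via choose(degree, 2) assumes there are no reciprocated or multiple edges.

```r
library(igraph)

# Hypothetical toy graph: triangle A-B-C plus a pendant node D attached to C
toy_edges <- data.frame(
  from = c("A", "B", "C", "C"),
  to   = c("B", "C", "A", "D")
)
toy_net <- graph_from_data_frame(toy_edges, directed = TRUE)

# By construction A, B, C form the only triangle
n_triangles <- 1

# Connected triples centred on each node, ignoring edge direction
deg_all   <- degree(toy_net, mode = "all")
n_triples <- sum(choose(deg_all, 2))

manual_transitivity  <- 3 * n_triangles / n_triples
# Edge directions are ignored by transitivity()
builtin_transitivity <- transitivity(toy_net, type = "global")

print(c(manual = manual_transitivity, builtin = builtin_transitivity))
```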
Diameter: the length of the longest shortest path (geodesic) between any two nodes in the network. get_diameter() returns the nodes along the first path found with that length.
diameter(occ_SUF_net, directed=F, weights=NA)
[1] 3
get_diameter(occ_SUF_net, directed=F, weights=NA)
+ 4/120 vertices, named, from ddfe691:
[1] Gardeners, garden workers until forest workers, forest cultivators Farmers until animal keepers and related occupations
[3] Agricultural machinery repairers until precision mechanics Mechanical engineering technicians
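As a sketch of how the diameter relates to pairwise distances, the example below computes all shortest-path lengths on a made-up four-node chain and takes their maximum. The graph and variable names are hypothetical. Direction and weights are ignored, as in the calls above.

```r
library(igraph)

# Hypothetical toy chain A -> B -> C -> D
toy_edges <- data.frame(
  from = c("A", "B", "C"),
  to   = c("B", "C", "D")
)
toy_net <- graph_from_data_frame(toy_edges, directed = TRUE)

# All pairwise shortest-path lengths, ignoring direction and weights
dist_matrix <- distances(toy_net, mode = "all", weights = NA)

# The diameter is the largest finite entry of that matrix
manual_diameter  <- max(dist_matrix[is.finite(dist_matrix)])
builtin_diameter <- diameter(toy_net, directed = FALSE, weights = NA)

# Nodes along one path of that length
diameter_path <- get_diameter(toy_net, directed = FALSE, weights = NA)

print(c(manual = manual_diameter, builtin = builtin_diameter))
```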
Node degrees: the number of adjacent edges to each node.
deg <- degree(occ_SUF_net, mode="all")
# Occupations with the lowest degree (fewest incoming + outgoing links)
print (sort(deg)[1:10])
Nurses, midwives Home wardens, social work teachers
26 27
Others attending on guests Forwarding business dealers
28 30
Nursery teachers, child nurses Cost accountants, valuers until accountants
31 33
Stenographers, shorthand-typists, typists until data typists Social workers, care workers until religious care helpers
33 34
Type setters, compositors until printers (flat, gravure) Bank specialists until building society specialists
36 37
# Occupations with the highest degree (values are negated because of sort(-deg))
print (sort(-deg)[1:10])
Chemical plant operatives
-110
Assistants (no further specification)
-109
Iron, metal producers, melters until semi-finished product fettlers and other mould casting occupations
-105
Goods examiners, sorters, n.e.c.
-104
Motor vehicle drivers
-104
Street cleaners, refuse disposers until machinery, container cleaners and related occupations
-104
Plastics processors
-102
Metal workers (no further specification)
-102
Other assemblers
-99
Miners until shaped brick/concrete block makers
-97
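The negated values above are a side effect of sort(-deg). A possible alternative, sketched here on a made-up graph (toy_edges, toy_net and toy_deg are hypothetical names), is to sort in decreasing order so the degrees keep their sign.

```r
library(igraph)

# Hypothetical toy edge list
toy_edges <- data.frame(
  from = c("A", "A", "A", "B"),
  to   = c("B", "C", "D", "A")
)
toy_net <- graph_from_data_frame(toy_edges, directed = TRUE)

# Total degree: incoming plus outgoing edges per node
toy_deg <- degree(toy_net, mode = "all")

# Highest degrees first, without negating the values
top_two    <- head(sort(toy_deg, decreasing = TRUE), 2)
# Lowest degrees first
bottom_two <- head(sort(toy_deg), 2)

print(top_two)
print(bottom_two)
```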
hist(deg, main="Network Node degree")
Degree distribution
deg.dist <- degree_distribution(occ_SUF_net, cumulative=T, mode="all")
plot( x=0:max(deg), y=1-deg.dist, pch=19, cex=1.2, col="orange", xlab="Degree", ylab="Cumulative Frequency")
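Strength is a weighted version of degree: it sums the weights of the edges adjacent to a node. Here the graph stores its weights in the SRt edge attribute rather than in weight, and no weights argument is passed, so strength() falls back to the plain degree, as the integer results below show.
In this example we use the mode “out”, which counts the outgoing links of each occupation.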
sort(strength(occ_SUF_net,mode="out"))[1:5]
Nurses, midwives Home wardens, social work teachers
14 14
Others attending on guests Forwarding business dealers
14 15
Social workers, care workers until religious care helpers
15
sort(-strength(occ_SUF_net))[1:5]
Chemical plant operatives
-110
Assistants (no further specification)
-109
Iron, metal producers, melters until semi-finished product fettlers and other mould casting occupations
-105
Goods examiners, sorters, n.e.c.
-104
Motor vehicle drivers
-104
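A minimal sketch of a weighted variant, assuming SRt is the intended weight: strength() uses an edge attribute named weight by default, so the SRt values have to be passed explicitly. The toy edge list and its values are made up for illustration.

```r
library(igraph)

# Hypothetical toy edge list with made-up SRt values
toy_edges <- data.frame(
  from = c("A", "A", "B"),
  to   = c("B", "C", "A"),
  SRt  = c(0.9, 0.4, 0.8)
)
toy_net <- graph_from_data_frame(toy_edges, directed = TRUE)

# No "weight" attribute and no weights argument: the result is the out-degree
# (suppressWarnings() guards against a possible "no edge weights" warning)
unweighted_out <- suppressWarnings(strength(toy_net, mode = "out"))

# Passing SRt explicitly sums the SRt values of the outgoing edges
weighted_out <- strength(toy_net, mode = "out", weights = E(toy_net)$SRt)

print(unweighted_out)
print(weighted_out)
```

An alternative would be to copy SRt into an edge attribute called weight, but that also changes the default behaviour of other weight-aware functions such as the shortest-path routines, so passing weights explicitly is the more conservative choice.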
hist(centr_degree(occ_SUF_net, mode="in", normalized=T)$res, main='Centrality',xlab="Node Centrality")
centr_degree(occ_SUF_net, mode="in", normalized=T)
$res
[1] 59 76 97 96 110 92 102 91 36 72 90 105 87 61 54 76 93 55 57 52 71 83 69 69 70 73 70 60 56 79 50 47 76 99 102
[36] 78 72 73 69 52 90 55 48 64 38 46 72 51 56 70 66 66 62 104 83 109 63 85 43 49 44 48 50 59 46 64 70 81 60 57
[71] 38 39 64 43 53 37 46 30 57 49 104 55 41 63 59 96 69 40 55 33 42 64 75 33 57 81 70 54 61 57 52 45 40 26 37
[106] 43 49 34 27 31 57 61 64 67 44 28 57 64 73 104
$centralization
[1] 0.3959384
$theoretical_max
[1] 14280
Eigenvector centrality (centrality proportional to the sum of the centralities of a node’s connections) is a measure of being well connected to the well-connected. The scores are the entries of the first eigenvector of the graph’s adjacency matrix.
hist(centr_eigen(occ_SUF_net, directed=T, normalized=T)$vector, main='Eigenvector Centrality',xlab="Node Eigenvector Centrality")
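To make the link with the first eigenvector concrete, the sketch below compares eigen_centrality() with a direct eigendecomposition of the adjacency matrix. It uses a made-up undirected graph (hypothetical names) to keep the matrix symmetric; the directed case used above follows the same idea but is not reproduced here.

```r
library(igraph)

# Hypothetical undirected toy graph: triangle A-B-C plus pendant node D
toy_edges <- data.frame(
  from = c("A", "B", "C", "C"),
  to   = c("B", "C", "A", "D")
)
toy_net <- graph_from_data_frame(toy_edges, directed = FALSE)

# Eigenvector centrality from igraph (scaled so that the maximum is 1)
ec_builtin <- eigen_centrality(toy_net)$vector

# Same scores from the leading eigenvector of the dense adjacency matrix
adj      <- as_adjacency_matrix(toy_net, sparse = FALSE)
leading  <- abs(eigen(adj, symmetric = TRUE)$vectors[, 1])
ec_manual <- leading / max(leading)

print(round(cbind(builtin = ec_builtin, manual = ec_manual), 4))
```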
Closeness (centrality based on distance to others in the graph) measures how many steps are required to reach every other node from a given node, i.e. how long information takes to arrive. It is computed as the inverse of the node’s average geodesic distance to the others in the network, so higher values mean greater centrality.
centr_clo(occ_SUF_net, mode="all", normalized=T)
$res
[1] 0.5748792 0.5509259 0.6071429 0.6010101 0.6648045 0.6230366 0.6165803 0.6010101 0.5265487 0.5891089 0.5586854 0.6134021 0.5776699
[14] 0.5483871 0.5173913 0.5458716 0.5979899 0.4958333 0.5042373 0.5265487 0.5833333 0.5586854 0.5560748 0.5534884 0.5891089 0.5776699
[27] 0.5748792 0.5509259 0.5721154 0.6040609 0.5639810 0.5509259 0.5920398 0.5920398 0.5891089 0.5891089 0.5950000 0.5950000 0.5920398
[40] 0.5534884 0.6102564 0.5433790 0.4857143 0.5288889 0.4666667 0.4979079 0.5336323 0.4917355 0.5063830 0.5804878 0.5748792 0.5458716
[53] 0.5483871 0.6329787 0.5950000 0.6102564 0.5360360 0.6040609 0.5151515 0.5312500 0.5312500 0.5312500 0.5288889 0.5666667 0.5384615
[66] 0.5920398 0.5891089 0.6165803 0.5748792 0.5776699 0.5085470 0.5042373 0.5721154 0.5458716 0.5336323 0.4541985 0.4798387 0.5173913
[79] 0.5483871 0.5586854 0.6467391 0.5666667 0.5458716 0.5666667 0.5384615 0.6040609 0.5639810 0.4559387 0.5063830 0.4541985 0.5360360
[92] 0.5242291 0.5534884 0.4541985 0.5666667 0.6230366 0.6134021 0.5776699 0.5173913 0.5586854 0.5613208 0.4779116 0.4958333 0.4296029
[105] 0.4979079 0.5196507 0.5219298 0.5129310 0.4722222 0.4837398 0.5219298 0.5776699 0.5384615 0.5950000 0.5360360 0.4917355 0.5586854
[118] 0.5613208 0.5804878 0.6263158
$centralization
[1] 0.2271654
$theoretical_max
[1] 59.24895
hist(centr_clo(occ_SUF_net, mode="all", normalized=T)$res, main='Centrality scores',xlab="The node-level centrality scores.")
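The normalized closeness score can be checked by hand as (n - 1) divided by the sum of a node's distances to all others, i.e. the inverse of its average geodesic distance. The sketch below does this on a made-up connected chain; the graph and variable names are hypothetical.

```r
library(igraph)

# Hypothetical toy chain A -> B -> C -> D (direction is ignored below)
toy_edges <- data.frame(
  from = c("A", "B", "C"),
  to   = c("B", "C", "D")
)
toy_net <- graph_from_data_frame(toy_edges, directed = TRUE)

n <- vcount(toy_net)

# Pairwise shortest-path lengths, ignoring direction and weights
dist_matrix <- distances(toy_net, mode = "all", weights = NA)

# Inverse of the average distance to the other n - 1 nodes
manual_closeness  <- (n - 1) / rowSums(dist_matrix)
builtin_closeness <- closeness(toy_net, mode = "all", normalized = TRUE)

print(rbind(manual = manual_closeness, builtin = builtin_closeness))
```

In this toy chain the two inner nodes obtain higher scores than the two end nodes, consistent with higher closeness meaning a more central position.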
# All centralisation types
centr_degree(occ_SUF_net)$centralization
[1] 0.3959384
centr_clo(occ_SUF_net, mode = "all")$centralization
[1] 0.2271654
centr_betw(occ_SUF_net, directed = FALSE)$centralization
[1] 0.02800183
centr_eigen(occ_SUF_net, directed = FALSE)$centralization
[1] 0.5825057
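The graph-level centralization indices above summarise how unequal the node-level scores are: the sum of the differences between the largest score and every other score, divided by the theoretical maximum of that sum. A small sketch on a made-up star graph (toy_net is a hypothetical name), using only the components that centr_degree() returns:

```r
library(igraph)

# Hypothetical toy graph: an undirected star with one hub and four leaves
toy_net <- make_star(5, mode = "undirected")

cd <- centr_degree(toy_net, mode = "all", loops = FALSE, normalized = TRUE)

# Sum of the gaps to the most central node, scaled by the theoretical maximum
manual_centralization <- sum(max(cd$res) - cd$res) / cd$theoretical_max

print(c(manual = manual_centralization, builtin = cd$centralization))
```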